尽管在整个科学和工程中都无处不在,但只有少数部分微分方程(PDE)具有分析或封闭形式的解决方案。这激发了有关PDE的数值模拟的大量经典工作,最近,对数据驱动技术的研究旋转了机器学习(ML)。最近的一项工作表明,与机器学习的经典数值技术的混合体可以对任何一种方法提供重大改进。在这项工作中,我们表明,在纳入基于物理学的先验时,数值方案的选择至关重要。我们以基于傅立叶的光谱方法为基础,这些光谱方法比其他数值方案要高得多,以模拟使用平滑且周期性解决方案的PDE。具体而言,我们为流体动力学的三个模型PDE开发了ML增强的光谱求解器,从而提高了标准光谱求解器在相同分辨率下的准确性。我们还展示了一些关键设计原则,用于将机器学习和用于解决PDE的数值方法结合使用。
translated by 谷歌翻译
合奏是一种直接,非常有效的方法,用于提高模型在分类任务上的准确性,校准和鲁棒性;然而,其成功基础的原因仍然是研究的积极领域。我们基于PFAU(2013)的偏见变化分解的扩展,以便对分类器合奏的行为产生关键的见解。为了引入偏见变化权衡的双重重新聚集,我们首先得出了典型的分类任务的非对称损失的总期望和差异的广义定律。比较条件和引导偏置/方差估计值,我们表明条件估计必定会导致不可还原误差。接下来,我们表明在双空间中结合会降低差异并使偏差不变,而标准结合可以任意影响偏见。从经验上讲,标准的结合减少偏见,使我们假设分类器的集合可能会出现很好的表现,部分原因是这种意外的减少。我们通过对最近的深度学习方法的经验分析来结束,这些方法是在超级范围上进行整体,这表明这些技术确实有利于降低偏见偏见的偏见偏见。这表明,与经典智慧相反,靶向偏见可能是分类器合奏的有希望的方向。
translated by 谷歌翻译
差异隐私(DP)提供了正式的隐私保证,以防止对手可以访问机器学习模型,从而从提取有关单个培训点的信息。最受欢迎的DP训练方法是差异私有随机梯度下降(DP-SGD),它通过在训练过程中注入噪声来实现这种保护。然而,以前的工作发现,DP-SGD通常会导致标准图像分类基准的性能显着降解。此外,一些作者假设DP-SGD在大型模型上固有地表现不佳,因为保留隐私所需的噪声规范与模型维度成正比。相反,我们证明了过度参数化模型上的DP-SGD可以比以前想象的要好得多。将仔细的超参数调整与简单技术结合起来,以确保信号传播并提高收敛速率,我们获得了新的SOTA,而没有额外数据的CIFAR-10,在81.4%的81.4%下(8,10^{ - 5}) - 使用40 -layer wide-Resnet,比以前的SOTA提高了71.7%。当对预训练的NFNET-F3进行微调时,我们在ImageNet(0.5,8*10^{ - 7})下达到了83.8%的TOP-1精度。此外,我们还在(8,8 \ cdot 10^{ - 7})下达到了86.7%的TOP-1精度,DP仅比当前的非私人SOTA仅4.3%。我们认为,我们的结果是缩小私人图像分类和非私有图像分类之间准确性差距的重要一步。
translated by 谷歌翻译
贝叶斯优化(BO)是用于全局优化昂贵的黑盒功能的流行范式,但是在许多域中,该函数并不完全是黑色框。数据可能具有一些已知的结构(例如对称性)和/或数据生成过程可能是一个复合过程,除优化目标的值外,还可以产生有用的中间或辅助信息。但是,传统上使用的代孕模型,例如高斯工艺(GPS),随数据集大小的规模较差,并且不容易适应已知的结构。取而代之的是,我们使用贝叶斯神经网络,这是具有感应偏见的一类可扩展和灵活的替代模型,将BO扩展到具有高维度的复杂,结构化问题。我们证明了BO在物理和化学方面的许多现实问题,包括使用卷积神经网络对光子晶体材料进行拓扑优化,以及使用图神经网络对分子进行化学性质优化。在这些复杂的任务上,我们表明,就抽样效率和计算成本而言,神经网络通常优于GP作为BO的替代模型。
translated by 谷歌翻译
High content imaging assays can capture rich phenotypic response data for large sets of compound treatments, aiding in the characterization and discovery of novel drugs. However, extracting representative features from high content images that can capture subtle nuances in phenotypes remains challenging. The lack of high-quality labels makes it difficult to achieve satisfactory results with supervised deep learning. Self-Supervised learning methods, which learn from automatically generated labels has shown great success on natural images, offer an attractive alternative also to microscopy images. However, we find that self-supervised learning techniques underperform on high content imaging assays. One challenge is the undesirable domain shifts present in the data known as batch effects, which may be caused by biological noise or uncontrolled experimental conditions. To this end, we introduce Cross-Domain Consistency Learning (CDCL), a novel approach that is able to learn in the presence of batch effects. CDCL enforces the learning of biological similarities while disregarding undesirable batch-specific signals, which leads to more useful and versatile representations. These features are organised according to their morphological changes and are more useful for downstream tasks - such as distinguishing treatments and mode of action.
translated by 谷歌翻译
Machine learning methods have seen increased application to geospatial environmental problems, such as precipitation nowcasting, haze forecasting, and crop yield prediction. However, many of the machine learning methods applied to mosquito population and disease forecasting do not inherently take into account the underlying spatial structure of the given data. In our work, we apply a spatially aware graph neural network model consisting of GraphSAGE layers to forecast the presence of West Nile virus in Illinois, to aid mosquito surveillance and abatement efforts within the state. More generally, we show that graph neural networks applied to irregularly sampled geospatial data can exceed the performance of a range of baseline methods including logistic regression, XGBoost, and fully-connected neural networks.
translated by 谷歌翻译
Large "instruction-tuned" language models (finetuned to respond to instructions) have demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they depend heavily on human-written instruction data that is limited in quantity, diversity, and creativity, therefore hindering the generality of the tuned model. We introduce Self-Instruct, a framework for improving the instruction-following capabilities of pretrained language models by bootstrapping off its own generations. Our pipeline generates instruction, input, and output samples from a language model, then prunes them before using them to finetune the original model. Applying our method to vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT_001, which is trained with private user data and human annotations. For further evaluation, we curate a set of expert-written instructions for novel tasks, and show through human evaluation that tuning GPT3 with Self-Instruct outperforms using existing public instruction datasets by a large margin, leaving only a 5% absolute gap behind InstructGPT_001. Self-Instruct provides an almost annotation-free method for aligning pre-trained language models with instructions, and we release our large synthetic dataset to facilitate future studies on instruction tuning.
translated by 谷歌翻译
While dense retrieval has been shown effective and efficient across tasks and languages, it remains difficult to create effective fully zero-shot dense retrieval systems when no relevance label is available. In this paper, we recognize the difficulty of zero-shot learning and encoding relevance. Instead, we propose to pivot through Hypothetical Document Embeddings~(HyDE). Given a query, HyDE first zero-shot instructs an instruction-following language model (e.g. InstructGPT) to generate a hypothetical document. The document captures relevance patterns but is unreal and may contain false details. Then, an unsupervised contrastively learned encoder~(e.g. Contriever) encodes the document into an embedding vector. This vector identifies a neighborhood in the corpus embedding space, where similar real documents are retrieved based on vector similarity. This second step ground the generated document to the actual corpus, with the encoder's dense bottleneck filtering out the incorrect details. Our experiments show that HyDE significantly outperforms the state-of-the-art unsupervised dense retriever Contriever and shows strong performance comparable to fine-tuned retrievers, across various tasks (e.g. web search, QA, fact verification) and languages~(e.g. sw, ko, ja).
translated by 谷歌翻译
Recent times have witnessed an increasing number of applications of deep neural networks towards solving tasks that require superior cognitive abilities, e.g., playing Go, generating art, question answering (such as ChatGPT), etc. Such a dramatic progress raises the question: how generalizable are neural networks in solving problems that demand broad skills? To answer this question, we propose SMART: a Simple Multimodal Algorithmic Reasoning Task and the associated SMART-101 dataset, for evaluating the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed specifically for children in the 6-8 age group. Our dataset consists of 101 unique puzzles; each puzzle comprises a picture and a question, and their solution needs a mix of several elementary skills, including arithmetic, algebra, and spatial reasoning, among others. To scale our dataset towards training deep neural networks, we programmatically generate entirely new instances for each puzzle while retaining their solution algorithm. To benchmark the performance on the SMART-101 dataset, we propose a vision and language meta-learning model using varied state-of-the-art backbone neural networks. Our experiments reveal that while powerful deep models offer reasonable performances on puzzles that they are trained on, they are not better than random accuracy when analyzed for generalization. We also evaluate the recent ChatGPT large language model on a subset of our dataset and find that while ChatGPT produces convincing reasoning abilities, the answers are often incorrect.
translated by 谷歌翻译
We introduce INSTRUCTOR, a new method for computing text embeddings given task instructions: every text input is embedded together with instructions explaining the use case (e.g., task and domain descriptions). Unlike encoders from prior work that are more specialized, INSTRUCTOR is a single embedder that can generate text embeddings tailored to different downstream tasks and domains, without any further training. We first annotate instructions for 330 diverse tasks and train INSTRUCTOR on this multitask mixture with a contrastive loss. We evaluate INSTRUCTOR on 70 embedding evaluation tasks (66 of which are unseen during training), ranging from classification and information retrieval to semantic textual similarity and text generation evaluation. INSTRUCTOR, while having an order of magnitude fewer parameters than the previous best model, achieves state-of-the-art performance, with an average improvement of 3.4% compared to the previous best results on the 70 diverse datasets. Our analysis suggests that INSTRUCTOR is robust to changes in instructions, and that instruction finetuning mitigates the challenge of training a single model on diverse datasets.
translated by 谷歌翻译